51 research outputs found

    Greedy structure learning from data that contains systematic missing values

    Learning from data that contain missing values represents a common phenomenon in many domains. Relatively few Bayesian Network structure learning algorithms account for missing data, and those that do tend to rely on standard approaches that assume missing data are missing at random, such as the Expectation-Maximisation algorithm. Because missing data are often systematic, there is a need for more pragmatic methods that can effectively deal with data sets containing missing values not missing at random. The absence of approaches that deal with systematic missing data impedes the application of BN structure learning methods to real-world problems where missingness is not random. This paper describes three variants of greedy search structure learning that utilise pairwise deletion and inverse probability weighting to maximally leverage the observed data and to limit potential bias caused by missing values. The first two of the variants can be viewed as sub-versions of the third and best performing variant, but are important in their own right in illustrating the successive improvements in learning accuracy. The empirical investigations show that the proposed approach outperforms the commonly used and state-of-the-art Structural EM algorithm, in terms of both learning accuracy and efficiency, and both when data are missing at random and when they are not missing at random.
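The idea of combining deletion of incomplete cases with inverse probability weighting can be illustrated with a minimal sketch. This is a generic textbook-style example, not the algorithm from the paper: the function name `ipw_mean`, the column names `x` and `y`, and the toy data are all assumptions made for illustration. Rows with a missing target are dropped, and each remaining row is weighted by the inverse of the estimated probability that the target is observed given a fully observed variable.

```python
from collections import defaultdict

def ipw_mean(rows, target, parent):
    """Estimate the mean of `target` when its missingness depends on `parent`.

    Hypothetical helper for illustration only. `rows` is a list of dicts,
    `parent` is assumed fully observed, and missing values are `None`.
    """
    # Step 1: estimate P(target is observed | parent value) from counts.
    observed = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row[parent]] += 1
        if row[target] is not None:
            observed[row[parent]] += 1

    # Step 2: drop rows with a missing target and weight the rest by
    # 1 / P(observed | parent), i.e. total / observed for that parent value.
    weighted_sum = 0.0
    weight_total = 0.0
    for row in rows:
        if row[target] is not None:
            weight = total[row[parent]] / observed[row[parent]]
            weighted_sum += weight * row[target]
            weight_total += weight
    return weighted_sum / weight_total

# Toy data (assumed): y is always observed when x = 0, but half of the
# y values are missing when x = 1, so the missingness is systematic.
rows = (
    [{"x": 0, "y": 0} for _ in range(4)]
    + [{"x": 1, "y": 1}, {"x": 1, "y": 1}, {"x": 1, "y": None}, {"x": 1, "y": None}]
)

observed_y = [r["y"] for r in rows if r["y"] is not None]
naive = sum(observed_y) / len(observed_y)  # biased towards x = 0 cases
weighted = ipw_mean(rows, "y", "x")        # re-balances the under-observed group
```

In this toy case the unweighted mean over observed cases understates the mean of `y`, while the weighted estimate recovers the value that the complete data would give.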

    Information fusion between knowledge and data in Bayesian network structure learning

    Bayesian Networks (BNs) have become a powerful technology for reasoning under uncertainty, particularly in areas that require causal assumptions that enable us to simulate the effect of intervention. The graphical structure of these models can be determined by causal knowledge, learnt from data, or a combination of both. While it seems plausible that the best approach in constructing a causal graph involves combining knowledge with machine learning, this approach remains underused in practice. We implement and evaluate 10 knowledge approaches with application to different case studies and BN structure learning algorithms available in the open-source Bayesys structure learning system. The approaches enable us to specify pre-existing knowledge that can be obtained from heterogeneous sources, to constrain or guide structure learning. Each approach is assessed in terms of structure learning effectiveness and efficiency, including graphical accuracy, model fitting, complexity, and runtime, making this the first paper that provides a comparative evaluation of a wide range of knowledge approaches for BN structure learning. Because the value of knowledge depends on what data are available, we illustrate the results both with limited and big data. While the overall results show that knowledge becomes less important with big data, because learning accuracy is already higher, some of the knowledge approaches are actually found to be more important with big data. Amongst the main conclusions is the observation that reduced search space obtained from knowledge does not always imply reduced computational complexity, perhaps because the relationships implied by the data and knowledge are in tension.

    Effective and efficient structure learning with pruning and model averaging strategies

    Learning the structure of a Bayesian Network (BN) with score-based solutions involves exploring the search space of possible graphs and moving towards the graph that maximises a given objective function. Some algorithms offer exact solutions that are guaranteed to return the graph with the highest objective score, while others offer approximate solutions in exchange for reduced computational complexity. This paper describes an approximate BN structure learning algorithm, which we call Model Averaging Hill-Climbing (MAHC), that combines two novel strategies with hill-climbing search. The algorithm starts by pruning the search space of graphs, where the pruning strategy can be viewed as an aggressive version of the pruning strategies that are typically applied to combinatorial optimisation structure learning problems. It then performs model averaging in the hill-climbing search process and moves to the neighbouring graph that maximises the objective function, on average, for that neighbouring graph and over all its valid neighbouring graphs. Comparisons with other algorithms spanning different classes of learning suggest that the combination of aggressive pruning with model averaging is both effective and efficient, particularly in the presence of data noise.
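The model-averaging move described above can be sketched on an abstract search space. This is a simplified illustration under stated assumptions, not the MAHC implementation: the pruning stage is omitted, the states are integers rather than graphs, the names `averaged_score`, `model_averaging_hill_climb`, `toy_score`, and `toy_neighbours` are hypothetical, and the stopping rule (stop when no neighbour improves on the current averaged score) is an assumption.

```python
def averaged_score(state, score, neighbours):
    """Mean objective score over a state and all of its neighbours."""
    group = [state] + list(neighbours(state))
    return sum(score(s) for s in group) / len(group)

def model_averaging_hill_climb(start, score, neighbours):
    """Hill-climb by moving to the neighbour with the best averaged score.

    Assumed stopping rule: halt when no neighbour has a strictly higher
    averaged score than the current state.
    """
    current = start
    current_avg = averaged_score(current, score, neighbours)
    while True:
        candidates = list(neighbours(current))
        if not candidates:
            return current
        best = max(candidates, key=lambda s: averaged_score(s, score, neighbours))
        best_avg = averaged_score(best, score, neighbours)
        if best_avg <= current_avg:
            return current
        current, current_avg = best, best_avg

# Toy search space (assumed): states 0..6 on a line, each with a fixed score.
TOY_SCORES = [0, 1, 2, 3, 5, 4, 0]

def toy_score(state):
    return TOY_SCORES[state]

def toy_neighbours(state):
    # Neighbours are the adjacent positions that exist on the line.
    return [s for s in (state - 1, state + 1) if 0 <= s < len(TOY_SCORES)]

result = model_averaging_hill_climb(0, toy_score, toy_neighbours)
```

Averaging over a neighbourhood means a single unusually high or low score has less influence on each move, which is one plausible reading of why the abstract reports a benefit under data noise.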

    STATISTICAL ANALYSIS OF REGIONAL DISTRIBUTION OF FOOTBALL CLUBS IN ENGLISH TOP FLIGHT LEAGUE

    English football leagues and local competitions are highly followed worldwide. Clubs compete for titles and honors on a yearly basis. English clubs are located in 10 regions of England and Wales. The detailed representation of the clubs according to the regions is rarely discussed. Analysis is often restricted to club by club performances. This article presents the data analysis of the regional distribution of football clubs in the English top flight (also called the top division or top level) from inception to the Premier League era. Generally, there is a significantly uneven distribution of clubs across the regions. The North West and West Midlands regions have been represented by clubs from inception to date. Data analysis revealed that the early days of the English top level league were dominated by clubs from the North West and London, whereas the regions of the East of England, South West and Wales are least represented in the top league. Also, there seems to be a ‘Southern shift’ in recent years with increased representation of clubs from Southern England. However, some balance is now maintained given that all ten regions have been represented by at least one club over the last three seasons, 2016/17 to 2018/19.

    pi-football: A Bayesian network model for forecasting Association Football match outcomes

    A Bayesian network is a graphical probabilistic belief network that represents the conditional dependencies among uncertain variables, which can be both objective and subjective. We present a Bayesian network model for forecasting Association Football matches in which the subjective variables represent the factors that are important for prediction but which historical data fails to capture. The model (pi-football) was used to generate forecasts about the outcomes of the English Premier League (EPL) matches during season 2010/11 (but is easily extended to any football league). Forecasts were published online at www.pi-football.com prior to the start of each match. In this paper, we demonstrate that a) using an appropriate measure of forecast accuracy, the subjective information improved the model such that posterior forecasts were on par with bookmakers' performance; b) using a standard profitability measure with discrepancy levels at ≥ 5%, the model generates profit under maximum, mean, and common bookmakers' odds, even allowing for the bookmakers' built-in profit margin. Hence, compared with other published football forecast models, pi-football not only appears to be exceptionally accurate, but it can also be used to 'beat the bookies'.

    Hidden dynamics of soccer leagues: the predictive ‘power’ of partial standings

    Objectives: Soccer leagues reflect the partial standings of the teams involved after each round of competition. However, the ability of partial league standings to predict end-of-season position has largely been ignored. Here we analyze historical partial standings from English soccer to understand the mathematics underpinning league performance and evaluate the predictive ‘power’ of partial standings. Methods: Match data (1995-2017) from the four senior English leagues was analyzed, together with random match scores generated for hypothetical leagues of equivalent size. For each season the partial standings were computed and Kendall’s normalized tau-distance and Spearman r-values determined. Best-fit power-law and logarithmic functions were applied to the respective tau-distance and Spearman curves, with the ‘goodness-of-fit’ assessed using the R² value. The predictive ability of the partial standings was evaluated by computing the transition probabilities between the standings at rounds 10, 20 and 30 and the final end-of-season standings for the 22 seasons. The impact of reordering match fixtures was also evaluated. Results: All four English leagues behaved similarly, irrespective of the teams involved, with the tau-distance conforming closely to a power law (R² > 0.80) and the Spearman r-value obeying a logarithmic function (R² > 0.87). The randomized leagues also conformed to a power-law, but had a different shape. In the English leagues, team position relative to end-of-season standing became ‘fixed’ much earlier in the season than was the case with the randomized leagues. In the Premier League, 76.9% of the variance in the final standings was explained by round-10, 87.0% by round-20, and 93.9% by round-30. Reordering of match fixtures appeared to alter the shape of the tau-distance curves. Conclusions: All soccer leagues appear to conform to mathematical laws, which constrain the league standings as the season progresses. This means that partial standings can be used to predict end-of-season league position with reasonable accuracy.
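Kendall's normalized tau-distance between a partial standing and the final standing can be computed as the fraction of team pairs whose relative order differs between the two rankings. The sketch below is a minimal illustration assuming both rankings contain the same teams with no ties; the function name and the four-team example are hypothetical and are not taken from the study.

```python
from itertools import combinations

def normalized_kendall_tau_distance(ranking_a, ranking_b):
    """Fraction of team pairs ordered differently in the two rankings.

    Assumes both rankings list the same teams exactly once (no ties).
    Returns 0.0 for identical rankings and 1.0 for fully reversed ones.
    """
    position_a = {team: i for i, team in enumerate(ranking_a)}
    position_b = {team: i for i, team in enumerate(ranking_b)}
    pairs = list(combinations(ranking_a, 2))
    # A pair is discordant when its order in A is the opposite of its order in B.
    discordant = sum(
        1
        for x, y in pairs
        if (position_a[x] - position_a[y]) * (position_b[x] - position_b[y]) < 0
    )
    return discordant / len(pairs)

# Hypothetical four-team league: standings after some round vs. end of season.
partial_standing = ["A", "B", "C", "D"]
final_standing = ["A", "C", "B", "D"]
distance = normalized_kendall_tau_distance(partial_standing, final_standing)
```

Here only the pair (B, C) swaps order, so one of the six pairs is discordant. Computing this distance after every round of a season gives the kind of curve to which the abstract fits a power law.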

    The impact of immediate breast reconstruction on the time to delivery of adjuvant therapy: the iBRA-2 study

    Background: Immediate breast reconstruction (IBR) is routinely offered to improve quality-of-life for women requiring mastectomy, but there are concerns that more complex surgery may delay adjuvant oncological treatments and compromise long-term outcomes. High-quality evidence is lacking. The iBRA-2 study aimed to investigate the impact of IBR on time to adjuvant therapy. Methods: Consecutive women undergoing mastectomy ± IBR for breast cancer July–December, 2016 were included. Patient demographics, operative, oncological and complication data were collected. Time from last definitive cancer surgery to first adjuvant treatment for patients undergoing mastectomy ± IBR were compared and risk factors associated with delays explored. Results: A total of 2540 patients were recruited from 76 centres; 1008 (39.7%) underwent IBR (implant-only [n = 675, 26.6%]; pedicled flaps [n = 105, 4.1%] and free-flaps [n = 228, 8.9%]). Complications requiring re-admission or re-operation were significantly more common in patients undergoing IBR than those receiving mastectomy. Adjuvant chemotherapy or radiotherapy was required by 1235 (48.6%) patients. No clinically significant differences were seen in time to adjuvant therapy between patient groups but major complications irrespective of surgery received were significantly associated with treatment delays. Conclusions: IBR does not result in clinically significant delays to adjuvant therapy, but post-operative complications are associated with treatment delays. Strategies to minimise complications, including careful patient selection, are required to improve outcomes for patients.

    Prevalence, associated factors and outcomes of pressure injuries in adult intensive care unit patients: the DecubICUs study

    Funder: European Society of Intensive Care Medicine; doi: http://dx.doi.org/10.13039/501100013347. Funder: Flemish Society for Critical Care Nurses. Abstract: Purpose: Intensive care unit (ICU) patients are particularly susceptible to developing pressure injuries. Epidemiologic data is however unavailable. We aimed to provide an international picture of the extent of pressure injuries and factors associated with ICU-acquired pressure injuries in adult ICU patients. Methods: International 1-day point-prevalence study; follow-up for outcome assessment until hospital discharge (maximum 12 weeks). Factors associated with ICU-acquired pressure injury and hospital mortality were assessed by generalised linear mixed-effects regression analysis. Results: Data from 13,254 patients in 1117 ICUs (90 countries) revealed 6747 pressure injuries; 3997 (59.2%) were ICU-acquired. Overall prevalence was 26.6% (95% confidence interval [CI] 25.9–27.3). ICU-acquired prevalence was 16.2% (95% CI 15.6–16.8). Sacrum (37%) and heels (19.5%) were most affected. Factors independently associated with ICU-acquired pressure injuries were older age, male sex, being underweight, emergency surgery, higher Simplified Acute Physiology Score II, Braden score 3 days, comorbidities (chronic obstructive pulmonary disease, immunodeficiency), organ support (renal replacement, mechanical ventilation on ICU admission), and being in a low or lower-middle income-economy. Gradually increasing associations with mortality were identified for increasing severity of pressure injury: stage I (odds ratio [OR] 1.5; 95% CI 1.2–1.8), stage II (OR 1.6; 95% CI 1.4–1.9), and stage III or worse (OR 2.8; 95% CI 2.3–3.3). Conclusion: Pressure injuries are common in adult ICU patients. ICU-acquired pressure injuries are associated with mainly intrinsic factors and mortality. Optimal care standards, increased awareness, appropriate resource allocation, and further research into optimal prevention are pivotal to tackle this important patient safety threat.